AITopics | neural temporal-difference learning converge

Neural Temporal-Difference Learning Converges to Global Optima

Neural Information Processing SystemsDec-25-2025, 18:28:12 GMT

Temporal-difference learning (TD), coupled with neural networks, is among the most fundamental building blocks of deep reinforcement learning. However, due to the nonlinearity in value function approximation, such a coupling leads to nonconvexity and even divergence in optimization. As a result, the global convergence of neural TD remains unclear. In this paper, we prove for the first time that neural TD converges at a sublinear rate to the global optimum of the mean-squared projected Bellman error for policy evaluation. In particular, we show how such global convergence is enabled by the overparametrization of neural networks, which also plays a vital role in the empirical success of neural TD. Beyond policy evaluation, we establish the global convergence of neural (soft) Q-learning, which is further connected to that of policy gradient algorithms.

global convergence, name change, neural temporal-difference learning converge, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Neural Temporal-Difference Learning Converges to Global Optima

Neural Information Processing SystemsMay-27-2025, 13:42:52 GMT

Temporal-difference learning (TD), coupled with neural networks, is among the most fundamental building blocks of deep reinforcement learning. However, due to the nonlinearity in value function approximation, such a coupling leads to nonconvexity and even divergence in optimization. As a result, the global convergence of neural TD remains unclear. In this paper, we prove for the first time that neural TD converges at a sublinear rate to the global optimum of the mean-squared projected Bellman error for policy evaluation. In particular, we show how such global convergence is enabled by the overparametrization of neural networks, which also plays a vital role in the empirical success of neural TD. Beyond policy evaluation, we establish the global convergence of neural (soft) Q-learning, which is further connected to that of policy gradient algorithms.

global convergence, global optima, neural temporal-difference learning converge, (2 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Reviews: Neural Temporal-Difference Learning Converges to Global Optima

Neural Information Processing SystemsJan-25-2025, 22:11:34 GMT

Originality: The paper relies on recent results on the implicit local linearization effect of overparametrized neural networks in the context of supervised learning, and on recent nonasymptotic analysis of Linear TD and Linear Q-learning. Perhaps the main insight is the relationship between the explicit linearization of Linear TD and the implicit linearization of overparametrized neural TD. Related work is properly referenced. Quality: The paper seems to be technically sound (although I have just skimmed over the proofs). The convergence of the three algorithms, namely Neural TD, Neural Q-learning, and Neural Soft Q-learning constitute a complete piece of work.

architecture, global optima, neural temporal-difference learning converge, (7 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Reviews: Neural Temporal-Difference Learning Converges to Global Optima

Neural Information Processing SystemsJan-25-2025, 22:11:23 GMT

The reviewers seem to be in agreement that this is above the acceptance threshold.

global optima, neural temporal-difference learning converge

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.85)

Add feedback

Neural Temporal-Difference Learning Converges to Global Optima

Neural Information Processing SystemsOct-10-2024, 13:47:47 GMT

Temporal-difference learning (TD), coupled with neural networks, is among the most fundamental building blocks of deep reinforcement learning. However, due to the nonlinearity in value function approximation, such a coupling leads to nonconvexity and even divergence in optimization. As a result, the global convergence of neural TD remains unclear. In this paper, we prove for the first time that neural TD converges at a sublinear rate to the global optimum of the mean-squared projected Bellman error for policy evaluation. In particular, we show how such global convergence is enabled by the overparametrization of neural networks, which also plays a vital role in the empirical success of neural TD. Beyond policy evaluation, we establish the global convergence of neural (soft) Q-learning, which is further connected to that of policy gradient algorithms.

global convergence, global optima, neural temporal-difference learning converge, (2 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Neural Temporal-Difference Learning Converges to Global Optima

Cai, Qi, Yang, Zhuoran, Lee, Jason D., Wang, Zhaoran

Neural Information Processing SystemsMar-19-2020, 01:16:25 GMT

Temporal-difference learning (TD), coupled with neural networks, is among the most fundamental building blocks of deep reinforcement learning. However, due to the nonlinearity in value function approximation, such a coupling leads to nonconvexity and even divergence in optimization. As a result, the global convergence of neural TD remains unclear. In this paper, we prove for the first time that neural TD converges at a sublinear rate to the global optimum of the mean-squared projected Bellman error for policy evaluation. In particular, we show how such global convergence is enabled by the overparametrization of neural networks, which also plays a vital role in the empirical success of neural TD.

global convergence, global optima, neural temporal-difference learning converge, (2 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Filters

Collaborating Authors

neural temporal-difference learning converge

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

Neural Temporal-Difference Learning Converges to Global Optima

Neural Temporal-Difference Learning Converges to Global Optima

Reviews: Neural Temporal-Difference Learning Converges to Global Optima

Reviews: Neural Temporal-Difference Learning Converges to Global Optima

Neural Temporal-Difference Learning Converges to Global Optima

Neural Temporal-Difference Learning Converges to Global Optima